scalarization weight
- Europe > Netherlands > North Holland > Amsterdam (0.40)
- North America > Canada > Ontario > Toronto (0.14)
- North America > United States (0.04)
- Telecommunications (0.40)
- Semiconductors & Electronics (0.40)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.69)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.46)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
- Europe > Netherlands > North Holland > Amsterdam (0.40)
- North America > Canada > Ontario > Toronto (0.14)
- North America > United States (0.04)
- Telecommunications (0.40)
- Semiconductors & Electronics (0.40)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.69)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.46)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Scalarization for Multi-Task and Multi-Domain Learning at Scale
Royer, Amelie, Blankevoort, Tijmen, Bejnordi, Babak Ehteshami
Training a single model on multiple input domains and/or output tasks allows for compressing information from multiple sources into a unified backbone hence improves model efficiency. It also enables potential positive knowledge transfer across tasks/domains, leading to improved accuracy and data-efficient training. However, optimizing such networks is a challenge, in particular due to discrepancies between the different tasks or domains: Despite several hypotheses and solutions proposed over the years, recent work has shown that uniform scalarization training, i.e., simply minimizing the average of the task losses, yields on-par performance with more costly SotA optimization methods. This raises the issue of how well we understand the training dynamics of multi-task and multi-domain networks. In this work, we first devise a large-scale unified analysis of multi-domain and multi-task learning to better understand the dynamics of scalarization across varied task/domain combinations and model sizes. Following these insights, we then propose to leverage population-based training to efficiently search for the optimal scalarization weights when dealing with a large number of tasks or domains.
- North America > Canada > Ontario > Toronto (0.14)
- North America > United States > District of Columbia > Washington (0.04)
- Europe > Netherlands > North Holland > Amsterdam (0.04)